ABSTRACT

This final report summarizes past research and
suggests new approaches to the problem of estimating long-term
individual constants using path analysis. The general objective of
the research was to detect and measure the likelihood that one
variable, x, measured at time t, has a causal influence on another
variable, y, measured at a subsequent time t + k (k being the
measurement interval). Work on this problem continues under a grant
from the National Science Foundation. (CK)
FIGURES & TABLES

Figure 1. Auto- and cross-correlograms generated by "example 1"
in text. Variables x and y each influence the other by the same
amount, but the x → y influence is distributed over 5 lags,
whereas the y → x influence is concentrated at one lag.

Figure 2. Auto- and cross-correlations generated by "example 2".
The y → x influence is twice as strong as the x → y, but the
correlogram for the latter is higher because of the high
autoregressive stability of dependent variable y.

Figure 3. Path model of an autocorrelated variable with
measurement error and long-term individual constants.

Table 1. True path coefficients compared with regression
coefficients based on theoretically derived correlations,
for two examples.
Overview

The present report is a final one only in the sense of being the
last written under the present grant from the U.S. Office of Education.
Work on the same problem continues under a grant from the National
Science Foundation, with the same general objective: given a set of
panel data with the same variables remeasured at several intervals on
a given set of individuals (such as measures of educational motivation
and performance), is it possible (a) to detect and (b) to measure the
likelihood that one variable x measured at time t has a causal
influence on another variable y measured at a subsequent time t + k
(k being the measurement interval)?

While some progress has been made, it is not sufficient as yet, 
in our view, to warrant a comprehensive technical treatment (recapitu- 
lation of objectives, methods, hypotheses, conclusions). Therefore the 
following section will summarize what has been presented in interim 
reports, to which the reader is referred for further detail. 

Additional work is then reported, using techniques of path analysis
on the problem of predicting autocorrelations and lagged
cross-correlations in a two-variable system in which causal influence
is exerted in either direction over several causal intervals.

A simple method is described and illustrated for recovering the
underlying path coefficients (including causal coefficients) from
regression analysis of the auto- and cross-correlations.

A final section suggests a new approach, using path analysis, to 
the problem of estimating long-term individual constants. Future steps 
are indicated. 

Summary of work from September 1969 
through December 1970 

We began by obtaining two data banks with repeated measurements:
height, weight, and grip of 100 boys and 100 girls measured at 6-month
intervals between ages 6 and 9; and 3,000 students measured by the
Educational Testing Service in grades 5, 7, 9, and 11 with complete data
for four years on test batteries SCAT and STEP (aptitude and performance
measures) and for three years on BEQ (questionnaire items on interest
and behavior).

Previous work with two-variable simulated time series (Pelz,
Magliveras, and Lew, 1968) suggested that the appearance of causal
connections between two variables would be obscured by the tendency for
each individual to remain relatively stable on each variable. Such
stable long-term trends may be conceptualized as individual constants
around which short-term disturbances occur. The extremely high
correlations among successive height and weight scores suggested the
presence of such individual constants, rising of course with time, with
each individual retaining the same relative position.

Several months of effort were spent in devising a means for
estimating such a constant for each individual so that it could be
removed. The first progress reports under this grant (see list of
references, Pelz 1969-70) describe these efforts, particularly #3 for
April-June 1970, which deals in depth with an attempt to separate
empirical height and weight data into stable long-term trends and
relatively unstable but autocorrelated short-term disturbances. A
relatively complex computerized method is given for making such a
separation, subject to some restrictive assumptions placed on the
underlying model as well as on the number of time periods at which
measurements have been taken.

The method appears conceptually sound. However, it assumes that
the variable is causally independent; we have not yet devised a procedure
relevant for a dependent variable. Furthermore, when residual scores were
obtained by subtracting the estimated individual constants at each time
period, cross-correlations among the residuals did not lend themselves
to a simple interpretation of causal influence among height, weight, and
grip.

Accordingly the last six months of 1970 were spent in a different
approach, as described in progress reports #4 and #5, in which we moved
away from consideration of empirical data and explored mathematical
models of hypothetical causal structures. This work resulted in a paper
(Pelz and Faith, 1970) presented at the American Statistical Association
meetings in Detroit, December 27, 1970. We coupled the methods of path
analysis with those of matrix theory to give a more compact form of the
two-variable unidirectional causal scheme, and thereby greatly simplified
the derivation of the correlational properties of such a hypothetical
model. A detailed exposition of this is to be found in the technical
appendix of the ASA paper.

Reciprocal causation and distributed lags 

After completion of the ASA paper, our attention turned to somewhat
more complex situations, specifically those in which the causal
influences between two variables x and y were reciprocal (i.e. x → y
and y → x) and in which these causal influences were exerted not over
one single time interval but rather were distributed over several time
periods. We found that, with slight modifications, the methods employed
in studying the simple model described above were applicable to these
more complex situations. The basic structure of the problem turned out
to be very much the same; although the problem is more complex, it is
of no greater depth.

We define the model as follows. Corresponding to the recursive
relations (3) and (4) in the appendix of Pelz and Faith (1970) are:

[Equations (1) and (2) are not legible in the source.]

To get this model started requires the inclusion of the correlated
inputs $x_s$ and $y_s$, where s takes on the values -1, -2, ..., -G. As
before, this model is stationary with respect to translation in the
time domain if the correlations between the inputs satisfy certain
conditions, which in this case take the form of a system of linear
equations. Solving this system permits one to determine all of the
correlations of the more complex model, since all of these are determined
by the input correlations.

As in the simpler case, it is possible to generate a theoretically
impossible model if the path coefficients are too large. A simple
test was derived, similar to that used in the proof of (15b) in the
appendix, which will notify one of this situation if it arises.

Using the above procedure we have in fact been successful in
generating theoretical correlations for the two-way distributed-lag
model (see Pelz, Magliveras, and Lew, 1968) using the same parameters.
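
The test for a "theoretically impossible" model is not reproduced in this report. As a minimal sketch of one standard alternative, which is not claimed to be the authors' test, the two-variable distributed-lag model can be written in companion form and checked for stationarity: the process is stationary when every eigenvalue of the companion matrix lies inside the unit circle. The raw-score form assumed here (x_t = p_xx x_{t-1} + sum of p_xy_i y_{t-i} + u_t, and symmetrically for y_t), the function names, and the "too large" parameter set are all assumptions introduced for illustration.

```python
import numpy as np


def companion_matrix(p_xx, p_yy, p_xy, p_yx):
    """Companion matrix of the assumed two-variable distributed-lag model.

    p_xy[i] is the influence on x exerted by y at lag i + 1;
    p_yx[i] is the influence on y exerted by x at lag i + 1.
    State vector: (x_t, y_t, x_{t-1}, y_{t-1}, ..., x_{t-G+1}, y_{t-G+1}).
    """
    G = len(p_xy)
    if len(p_yx) != G:
        raise ValueError("p_xy and p_yx must cover the same number of lags")
    F = np.zeros((2 * G, 2 * G))
    for i in range(G):
        F[0, 2 * i + 1] = p_xy[i]   # y at lag i + 1 -> x_t
        F[1, 2 * i] = p_yx[i]       # x at lag i + 1 -> y_t
    F[0, 0] = p_xx                  # x_{t-1} -> x_t
    F[1, 1] = p_yy                  # y_{t-1} -> y_t
    F[2:, :-2] = np.eye(2 * G - 2)  # shift older values down the state vector
    return F


def spectral_radius(F):
    """Largest eigenvalue modulus of the companion matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(F))))


def is_stationary(F):
    """True when all eigenvalues lie strictly inside the unit circle."""
    return spectral_radius(F) < 1.0


# Parameter values of the two examples in the text, plus a hypothetical
# model whose coefficients are too large to be stationary.
example_1 = companion_matrix(0.70, 0.70, [0, 0, 0.25, 0, 0], [0.05] * 5)
example_2 = companion_matrix(0.50, 0.94, [0.04] * 5, [0.02] * 5)
too_large = companion_matrix(0.90, 0.90, [0.05] * 5, [0.05] * 5)

for name, F in [("example 1", example_1),
                ("example 2", example_2),
                ("too large", too_large)]:
    print(name, round(spectral_radius(F), 3), is_stationary(F))
```

Both examples pass this check, although example 2 sits close to the boundary because of its high autoregressive coefficient for y.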

Illustration with two examples 

Given below are two sample test runs of our procedure. In each
of them the influence is exerted in both directions. In the first
example the two variables are identical except that the influence of x
on y is distributed over 5 lags (i.e. intervals of 1, 2, 3, 4, and 5
time units) whereas the influence of y on x is concentrated at one lag
(3 time units), the total amount of influence being the same in each
case.

A partial diagram of the path model (many of the paths being omitted
for simplicity) is:



^ X, 5*- X,- > X, 3. X_ X, V 



^3 
The parameters of the first model are:

\[ p_{xx} = p_{yy} = .70 \]

\[ p_{xy_i} = \begin{cases} .25 & \text{for } i = 3 \\ 0 & \text{for } i \neq 3 \end{cases} \]

\[ p_{yx_i} = .05 \ \text{for } i = 1, 2, 3, 4, 5; \qquad \sum_i p_{yx_i} = .25 \]

The latter notation may be read: "influence on y exerted by x over time
lags of 1, 2, ..." The auto- and cross-correlograms resulting from this
model are plotted in Figure 1.

Note that in the left half of the cross-correlogram, the effect
of concentrating the causal influence y → x at a single lag is to make
the peak higher and sharper; in the right half, the effect of distributing
the causal influence x → y over 5 lags is to make the peak lower and
wider.

Figure 1 here

In the second example, the autoregressive path coefficient $p_{yy}$
is considerably higher than the autoregressive coefficient $p_{xx}$ for
the other variable. Each variable now influences the other over
five separate causal intervals, and the total influence of y on x is
twice as great as the total influence of x on y. The parameters of this
model are set as follows:

\[ p_{xy_i} = .04 \ \text{for } i = 1, 2, 3, 4, 5; \qquad \sum_i p_{xy_i} = .20 \]

\[ p_{yx_i} = .02 \ \text{for } i = 1, 2, 3, 4, 5; \qquad \sum_i p_{yx_i} = .10 \]

The resulting auto- and cross-correlations are plotted in Figure 2.

Figure 2 here
Figure 1. Auto- and cross-correlograms generated by "example 1" in text
(horizontal axis: measurement interval k). Variables x and y each
influence the other by the same amount, but the x → y influence is
distributed over 5 lags, whereas the y → x influence is concentrated at
one lag.
Since the causal influence y → x is twice that of x → y, one
would normally expect the cross-correlogram to be higher in the left
half than in the right half. As shown in Figure 2, this is not the case.
The reason for this apparent paradox lies in the fact that the y variable,
because of its extremely high autoregressive coefficient, has a high
"memory" of past influences from x, which therefore accumulate over time
and increase the cross-correlations. Variable x on the other hand, with
a relatively small autoregressive coefficient, soon "forgets" past
influences from y.
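
A rough back-of-the-envelope check of this "memory" argument is offered here as a heuristic only: it ignores the feedback between the two variables and any standardization, and it is not the report's derivation. In a first-order autoregressive variable with coefficient p, a unit input persists as p, p², ... in later periods, so its cumulative trace is the geometric sum 1/(1 − p). Applying this to the example 2 values:

```latex
\[
\sum_{j=0}^{\infty} p^{\,j} = \frac{1}{1-p}, \qquad
\frac{1}{1-p_{yy}} = \frac{1}{1-.94} \approx 16.7, \qquad
\frac{1}{1-p_{xx}} = \frac{1}{1-.50} = 2
\]
\[
x \to y:\quad .10 \times 16.7 \approx 1.67, \qquad
y \to x:\quad .20 \times 2 = .40
\]
```

On this crude measure the accumulated x → y influence is roughly four times the accumulated y → x influence, even though the y → x path coefficients are twice as large.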

The "recovery" process 

The apparent paradox in Figure 2 now raises an important question:
is it possible to analyze the auto- and cross-correlograms in such a
way as to recover the true values of the various path coefficients,
showing that the causal effect of y → x is in fact stronger than x → y,
despite the visual evidence to the contrary? It turns out that this
objective can be easily accomplished by applying linear multiple
regression to the correlational matrices, to express each of the
variables as a function of previous variables. The regression (beta)
coefficients for each predictor are then theoretically equivalent to
the path coefficients between that predictor and the particular
dependent variable. In the absence of extraneous factors such as
measurement error or long-term stability in equations (1) and (2), the
variables x and y are defined at each point in time by regression
equations in which the predictors are the same variables measured at
previous points in time.

As shown in Table 1, the resulting regression coefficients were
very close to the values of the path coefficients used to specify the
model.
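
The recovery idea can be illustrated with a minimal sketch that departs from the report in two labeled ways: it regresses on simulated data rather than on theoretically derived correlations, and it uses an assumed raw-score form of the model (unit-variance normal disturbances, unstandardized coefficients), so the estimates carry sampling error. The function names `simulate` and `lagged_regression`, the seed, and the series length are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(p_xx, p_yy, p_xy, p_yx, n, seed=0, burn_in=1000):
    """Simulate the assumed two-variable distributed-lag model.

    p_xy[i]: influence on x exerted by y at lag i + 1.
    p_yx[i]: influence on y exerted by x at lag i + 1.
    """
    G = len(p_xy)
    p_xy = np.asarray(p_xy, dtype=float)
    p_yx = np.asarray(p_yx, dtype=float)
    rng = np.random.default_rng(seed)
    total = n + burn_in
    u = rng.standard_normal(total)  # disturbances for x
    v = rng.standard_normal(total)  # disturbances for y
    x = np.zeros(total)
    y = np.zeros(total)
    for t in range(G, total):
        x_lags = x[t - G:t][::-1]   # x_{t-1}, ..., x_{t-G}
        y_lags = y[t - G:t][::-1]   # y_{t-1}, ..., y_{t-G}
        x[t] = p_xx * x[t - 1] + p_xy @ y_lags + u[t]
        y[t] = p_yy * y[t - 1] + p_yx @ x_lags + v[t]
    return x[burn_in:], y[burn_in:]  # discard the start-up transient


def lagged_regression(x, y, G):
    """Regress x_t and y_t on lags 1..G of both variables.

    Returns a 2 x 2G array: row 0 holds the equation for x_t, row 1 the
    equation for y_t; columns are x lags 1..G followed by y lags 1..G.
    """
    n = len(x)
    cols = [x[G - i:n - i] for i in range(1, G + 1)]
    cols += [y[G - i:n - i] for i in range(1, G + 1)]
    Z = np.column_stack(cols)
    targets = np.column_stack([x[G:], y[G:]])
    B, *_ = np.linalg.lstsq(Z, targets, rcond=None)
    return B.T


# Parameter values of example 1 in the text.
G = 5
x, y = simulate(0.70, 0.70, [0, 0, 0.25, 0, 0], [0.05] * G, n=50_000)
B = lagged_regression(x, y, G)
print(np.round(B, 2))
```

In row 0 the only sizable cross coefficient should fall at the third y lag, and in row 1 the five x-lag coefficients should each fall near .05, mirroring the pattern of example 1 in Table 1.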



Table 1. True path coefficients compared with regression
coefficients based on theoretically derived correlations,
for two examples.

                   Example 1                Example 2
              True    Regression       True    Regression
Parameter     value   coefficient      value   coefficient

p_{xx}         .70       .70            .50       .50
p_{yy}         .70       .70            .94       .92

p_{xy_1}       .00       .00            .04       .048
p_{xy_2}       .00       .00            .04      -.009
p_{xy_3}       .25       .25            .04       .803
p_{xy_4}       .00       .00            .04       .051
p_{xy_5}       .00       .00            .04       .018
  Sum                                   .20       .209

p_{yx_1}       .05       .055           .02       .018
p_{yx_2}       .05       .045           .02       .023
p_{yx_3}       .05       .050           .02       .019
p_{yx_4}       .05       .053           .02       .019
p_{yx_5}       .05       .048           .02       .021
  Sum          .25       .251           .10       .100

Note: the entry .803 is reproduced as it appears in the source; the
printed column sum of .209 would correspond to an entry of .101.



In example 1 all of the path coefficients were estimated with
rather high accuracy. There were slight errors in estimating the various
$p_{yx_i}$, but the sum of these estimates (regression coefficients) was
very close to the sum of the true values. Such errors arise from
computational inaccuracies (such as rounding).

In example 2 some inaccuracies appeared, chiefly in estimating
$p_{yy}$ and $p_{xy_i}$ (but note that the sum of the latter estimates
was reasonably close to the sum of the true values). Hence it may be
difficult to get accurate estimates of causal coefficients for a highly
autocorrelated variable functioning as an independent variable. As
mentioned previously, this situation gives rise to an oddly shaped
cross-correlogram.
Unresolved problem: estimating long-term tendencies

The chief unresolved problem for this project has been the one
indicated in section 1 of the progress report for April-June 1970, that
is, the separation of variables into long-term and short-term components.
That report explains how such a separation can be accomplished
for each variable in isolation. Once this is accomplished, the
short-term components of the variables are compared to determine causal
inferences. Unfortunately such a two-stage procedure has the
disadvantage that the two stages make different assumptions about the
nature of the variables. The first stage treats each of them as simple
autoregressive panels, each causally independent, whereas the second
regards them as interacting with each other.



One solution is to find some way to estimate all parameters
simultaneously. Work in this direction is not complete, although some
progress has been made by formulating the problem in terms of path
analysis, thereby reducing its complexity considerably. For example,
a much simpler and more direct solution has been discovered for the
problem of separating a single variable into short- and long-term
components, by employing path analysis. Consider the following path
model, diagrammed in Figure 3 and defined by the equations:

(3)  [equation not legible in the source]                 t = 1, 2, ...

(4)  X_t = [right-hand side not legible in the source]    t = 0, 1, 2, ...

Figure 3 here
Figure 3 here 
Figure 3. Path model of an autocorrelated variable with
measurement error and long-term individual constants.

In Figure 3 the terms $x_t$ represent true scores, $X_t$ measured scores,
$e_t$ measurement error, and z represents the long-term constant for each
individual. The terms $x_0$, $e_0$, z, $u_t$, and $e_t$ (t = 1, 2, ...)
are uncorrelated inputs to the system, and the variables $u_t$ are the
disturbances in the short-term component $x_t$. The theoretical
autocorrelations for this model are given by the relation



(5) r(X s ,X t ) 



|t-s| 

P xx p Xx P Xx 
s t 



+ 




9 



where s, t =* any pair of times. 

Since the correlations are non-linear functions of the path para- 
meters, the problem of recovering these is a non-trivial one. In fact, 
if P-__ is one, it is impossible to solve uniquely for all the parameters, 

as may be seen by observing equation (5). 
Even when this is not the case it is not known yet under
precisely what conditions a unique solution is obtainable, and more work
is required in this area. So far we are able to provide an estimate
of the parameters if there are at least six measurements over time
of the variable $X_t$. The uniqueness of this solution is presently open
to doubt, as is its sensitivity to sampling errors in the empirical
correlations.

The problem is simplified somewhat if some additional constraint
is placed on the model. For example, if it is assumed that the
influence of measurement error is constant over time, i.e. that
$p_{Xe_s} = p_{Xe_t} = p_{Xe}$ for all s and t, then it appears that
four measurements are sufficient to solve for the path parameters.
Again, the existence and uniqueness of this solution require further
study.
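
Equation (5) is only partly legible, so the following is a sketch under stated assumptions rather than the report's solution. It assumes the autocorrelation of measured scores k periods apart has the form r(k) = a² pᵏ + c², where p stands for $p_{xx}$, a for a time-invariant path from true score to measured score, and c for a time-invariant path from the individual constant z to the measured score. Holding all three paths constant over time is a stronger constraint than the one stated in the text, and the names `autocorr` and `recover` are hypothetical. Under these assumptions the lag-1, lag-2, and lag-3 autocorrelations (available from four measurements) determine p, a, and c, and the solution breaks down when p = 1 because the autocorrelations are then flat.

```python
def autocorr(k, p, a, c):
    """Assumed autocorrelation of measured scores k >= 1 periods apart."""
    return a * a * p ** k + c * c


def recover(r1, r2, r3, tol=1e-12):
    """Solve the assumed form for (p, a, c) from lag-1, lag-2, lag-3 values."""
    d1 = r1 - r2  # equals a^2 * p * (1 - p) under the assumed form
    d2 = r2 - r3  # equals a^2 * p^2 * (1 - p) under the assumed form
    if abs(d1) < tol:
        # Flat autocorrelations: only a^2 + c^2 can be determined.
        raise ValueError("flat autocorrelations: parameters are not identified")
    p = d2 / d1
    a_sq = d1 / (p * (1.0 - p))
    c_sq = r1 - a_sq * p
    return p, a_sq ** 0.5, c_sq ** 0.5


# Hypothetical parameter values, chosen only to exercise the algebra.
r1, r2, r3 = (autocorr(k, p=0.8, a=0.6, c=0.5) for k in (1, 2, 3))
print(recover(r1, r2, r3))

# With p = 1 the short-term and long-term terms cannot be separated.
try:
    recover(*(autocorr(k, p=1.0, a=0.6, c=0.5) for k in (1, 2, 3)))
except ValueError as err:
    print(err)
```

The closed form works with exact correlations; with sampled correlations the ratio of differences would be sensitive to sampling error, which is the concern the text raises.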

Even if equations (3) and (4) can be solved for the path co- 
efficients, however, the most interesting questions, those concerning 
the causal relations between the variable x and other variables, are 
left unanswered, since this model describes only one variable, inde- 
pendent of the outside causal factors. Although the solution to the 
more complex problem is not at hand, it would appear that the most 
direct approach to it would be by means of path analysis, because of 
the degree to which this simplifies the statement of most linear causal 
models. 

Future steps 

Work is continuing in the further generalization of the linear 
causal model described above and in the ASA paper (Pelz and Faith, 1970). 
In particular it appears possible to compute the correlations that will 
arise in a model consisting of an arbitrary number of variables which 
causally influence each other over an arbitrary number of causal in- 
tervals. The theoretical concepts involved are the same as those em- 
ployed for the two-variable model with distributed lags and reciprocal 
causation. Only the mechanics of the computation are more elaborate,
because of the great increase in the number of equations to be solved
as the number of variables in the model increases.